Librosa tutorial

Environments

We assume that you have already installed Anaconda.

If you don't have an environment yet, create one with the following command:

conda create --name YOURNAME scipy jupyter ipython

(Replace YOURNAME with whatever you want to call the new environment.)

Then, activate the new environment:

source activate YOURNAME

Installing librosa

Librosa can then be installed with the following command:

conda install -c conda-forge librosa

NOTE: Windows users need to install an audio decoding library separately. We recommend ffmpeg.
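
For example, ffmpeg can also be installed from conda-forge (one option; any ffmpeg on your PATH should work):

conda install -c conda-forge ffmpeg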

Test drive

Start Jupyter:

jupyter notebook

and open a new notebook.

Then, run the following:


In [ ]:
import librosa
print(librosa.__version__)

In [ ]:
y, sr = librosa.load(librosa.util.example_audio_file())
print(len(y), sr)

Documentation!

Librosa has extensive documentation with examples.

When in doubt, go to http://librosa.github.io/librosa/

Conventions

  • All data are basic numpy types
  • Audio buffers are called y
  • Sampling rate is called sr
  • The last axis is time-like:
      y[1000] is the 1001st sample
      S[:, 100] is the 101st frame of S
  • Defaults sr=22050, hop_length=512
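
For example, a quick check of the defaults with librosa.frames_to_time:

In [ ]:
# With the defaults sr=22050 and hop_length=512,
# each frame advances 512 / 22050 ≈ 23 ms
print(librosa.frames_to_time(1, sr=22050, hop_length=512))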

Roadmap for today

  • librosa.core
  • librosa.feature
  • librosa.display
  • librosa.beat
  • librosa.segment
  • librosa.decompose

librosa.core

  • Low-level audio processes
  • Unit conversion
  • Time-frequency representations

To load a signal at its native sampling rate, use sr=None


In [ ]:
y_orig, sr_orig = librosa.load(librosa.util.example_audio_file(),
                               sr=None)
print(len(y_orig), sr_orig)

Resampling is easy


In [ ]:
sr = 22050

y = librosa.resample(y_orig, sr_orig, sr)

print(len(y), sr)

But what's that in seconds?


In [ ]:
print(librosa.samples_to_time(len(y), sr))
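
librosa.get_duration gives the same answer directly from the signal:

In [ ]:
# Duration of the signal in seconds
print(librosa.get_duration(y=y, sr=sr))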

Spectral representations

Short-time Fourier transform underlies most analysis.

librosa.stft returns a complex matrix D.

D[f, t] is the FFT value at frequency f, time (frame) t.


In [ ]:
D = librosa.stft(y)
print(D.shape, D.dtype)
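
The row and frame indices map back to physical units; a quick sketch, assuming the stft defaults n_fft=2048 and hop_length=512:

In [ ]:
# Row index -> frequency (Hz); column (frame) index -> time (seconds)
freqs = librosa.fft_frequencies(sr=sr, n_fft=2048)
print(freqs[0], freqs[-1])
print(librosa.frames_to_time(D.shape[1] - 1, sr=sr, hop_length=512))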

Often, we only care about the magnitude.

D contains both magnitude S and phase $\phi$.

$$ D_{ft} = S_{ft} \exp\left(j \phi_{ft}\right) $$

In [ ]:
import numpy as np

In [ ]:
S, phase = librosa.magphase(D)
print(S.dtype, phase.dtype, np.allclose(D, S * phase))

Constant-Q transforms

The CQT gives a logarithmically spaced frequency basis.

This representation is more natural for many analysis tasks.


In [ ]:
C = librosa.cqt(y, sr=sr)

print(C.shape, C.dtype)
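
The bin frequencies are spaced logarithmically; a quick check with librosa.cqt_frequencies (assuming the default fmin of C1):

In [ ]:
# 12 bins per octave by default, starting from (we assume) C1
cqt_freqs = librosa.cqt_frequencies(C.shape[0], fmin=librosa.note_to_hz('C1'))
print(cqt_freqs[:3])
print(cqt_freqs[-3:])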

Exercise 0

  • Load a different audio file
  • Compute its STFT with a different hop length

In [ ]:
# Exercise 0 solution

y2, sr2 = librosa.load(   )

D = librosa.stft(y2, hop_length=   )

librosa.feature

  • Standard features:
    • librosa.feature.melspectrogram
    • librosa.feature.mfcc
    • librosa.feature.chroma
    • Lots more...
  • Feature manipulation:
    • librosa.feature.stack_memory
    • librosa.feature.delta
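
For example, a quick sketch of the feature-manipulation helpers, using MFCCs (not otherwise computed in this tutorial) as the input:

In [ ]:
# delta computes local derivatives; stack_memory concatenates
# time-lagged copies of its input
mfcc = librosa.feature.mfcc(y=y, sr=sr)

mfcc_delta = librosa.feature.delta(mfcc)
mfcc_stack = librosa.feature.stack_memory(mfcc, n_steps=2)

print(mfcc.shape, mfcc_delta.shape, mfcc_stack.shape)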

Most features work with either audio or STFT input.


In [ ]:
melspec = librosa.feature.melspectrogram(y=y, sr=sr)

# melspectrogram expects a power spectrogram (magnitude squared) when given S
melspec_stft = librosa.feature.melspectrogram(S=S**2, sr=sr)

print(np.allclose(melspec, melspec_stft))

librosa.display

  • Plotting routines for spectra and waveforms

  • Note: major overhaul coming in 0.5


In [ ]:
# Displays are built with matplotlib 
import matplotlib.pyplot as plt

# Let's make plots pretty
import matplotlib.style as ms
ms.use('seaborn-muted')

# Render figures interactively in the notebook
%matplotlib nbagg

# IPython gives us an audio widget for playback
from IPython.display import Audio

Waveform display


In [ ]:
plt.figure()
librosa.display.waveplot(y=y, sr=sr)

A basic spectrogram display


In [ ]:
plt.figure()
librosa.display.specshow(melspec, y_axis='mel', x_axis='time')
plt.colorbar()
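
The raw mel power can be hard to read; a dB-scaled version (using librosa.logamplitude, as in the NMF example later) is usually clearer:

In [ ]:
# dB-scaling relative to the peak makes the dynamics visible
plt.figure()
librosa.display.specshow(librosa.logamplitude(melspec, ref_power=np.max),
                         y_axis='mel', x_axis='time')
plt.colorbar(format='%+2.0f dB')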

Exercise 1

  • Pick a feature extractor from the librosa.feature submodule and plot the output with librosa.display.specshow
  • Bonus: Customize the plot using either specshow arguments or pyplot functions

In [ ]:
# Exercise 1 solution

X = librosa.feature.XX()

plt.figure()

librosa.display.specshow(    )

librosa.beat

  • Beat tracking and tempo estimation

The beat tracker returns the estimated tempo and beat positions (measured in frames)


In [ ]:
tempo, beats = librosa.beat.beat_track(y=y, sr=sr)
print(tempo)
print(beats)
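
The frame indices convert to timestamps with librosa.frames_to_time:

In [ ]:
# Convert beat frames to timestamps (seconds)
beat_times = librosa.frames_to_time(beats, sr=sr)
print(beat_times)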

Let's sonify it!


In [ ]:
clicks = librosa.clicks(frames=beats, sr=sr, length=len(y))

Audio(data=y + clicks, rate=sr)

Beats can be used to downsample features


In [ ]:
chroma = librosa.feature.chroma_cqt(y=y, sr=sr)
chroma_sync = librosa.feature.sync(chroma, beats)

In [ ]:
plt.figure(figsize=(6, 3))
plt.subplot(2, 1, 1)
librosa.display.specshow(chroma, y_axis='chroma')
plt.ylabel('Full resolution')
plt.subplot(2, 1, 2)
librosa.display.specshow(chroma_sync, y_axis='chroma')
plt.ylabel('Beat sync')

librosa.segment

  • Self-similarity / recurrence
  • Segmentation

Recurrence matrices encode self-similarity

R[i, j] = similarity between frames (i, j)

Librosa computes recurrence between k-nearest neighbors.


In [ ]:
R = librosa.segment.recurrence_matrix(chroma_sync)

In [ ]:
plt.figure(figsize=(4, 4))
librosa.display.specshow(R)

We can include affinity weights for each link as well.


In [ ]:
R2 = librosa.segment.recurrence_matrix(chroma_sync,
                                       mode='affinity',
                                       sym=True)

In [ ]:
plt.figure(figsize=(5, 4))
librosa.display.specshow(R2)
plt.colorbar()

Exercise 2

  • Plot a recurrence matrix using different features
  • Bonus: Use a custom distance metric

In [ ]:
# Exercise 2 solution

librosa.decompose

  • hpss: Harmonic-percussive source separation
  • nn_filter: Nearest-neighbor filtering, non-local means, REPET-SIM
  • decompose: NMF, PCA and friends
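
hpss and decompose are demonstrated below; nn_filter is not, so here is a minimal sketch (assuming extra keyword arguments are passed through to the recurrence matrix):

In [ ]:
# Nearest-neighbor filtering sketch: replace each chroma frame by the
# median of its nearest neighbors (a simple non-local smoothing)
chroma_smooth = librosa.decompose.nn_filter(chroma,
                                            aggregate=np.median,
                                            metric='cosine')
print(chroma_smooth.shape)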

Separating harmonics from percussives is easy


In [ ]:
D_harm, D_perc = librosa.decompose.hpss(D)

y_harm = librosa.istft(D_harm)

y_perc = librosa.istft(D_perc)

In [ ]:
Audio(data=y_harm, rate=sr)

In [ ]:
Audio(data=y_perc, rate=sr)

NMF is pretty easy also!


In [ ]:
# Fit the model
W, H = librosa.decompose.decompose(S, n_components=16, sort=True)

In [ ]:
plt.figure(figsize=(6, 3))
plt.subplot(1, 2, 1), plt.title('W')
librosa.display.specshow(librosa.logamplitude(W**2), y_axis='log')
plt.subplot(1, 2, 2), plt.title('H')
librosa.display.specshow(H, x_axis='time')

In [ ]:
# Reconstruct the signal using only the first component
S_rec = W[:, :1].dot(H[:1, :])

y_rec = librosa.istft(S_rec * phase)

In [ ]:
Audio(data=y_rec, rate=sr)

Exercise 3

  • Compute a chromagram using only the harmonic component
  • Bonus: run the beat tracker using only the percussive component
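
One possible solution sketch, reusing y_harm and y_perc from the hpss step above:

In [ ]:
# Exercise 3: one possible approach
chroma_harm = librosa.feature.chroma_cqt(y=y_harm, sr=sr)

# Bonus: track beats on the percussive component only
tempo_perc, beats_perc = librosa.beat.beat_track(y=y_perc, sr=sr)
print(tempo_perc)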

Wrapping up